This paper describes a set of measurements of the memory reference
patterns of some programs. The technique used to obtain these
measurements is unusually efficient. The data is presented in
graphical form to allow the reader to *see* how the program uses memory.


Constant use of a page and sequential access of memory are easily
observed. An attempt is made to classify the programs based on their
referencing behavior. From this analysis it is hoped that the reader
will gain some insights as to the effectiveness of various memory
management policies.


The performance and memory reference patterns of programs have been
studied for many years. Over time, however, the basic assumptions for
the memory management algorithms and the characteristics of the
programs run under these algorithms can change. This can be due to
technological advances or to changes in the workload on the computers.
This paper shows the patterns of reference of several programs. Some of
the programs are large in that they consume much time and memory. Other
programs are included because they are used frequently in the
environment being considered. From this study it is hoped that the
reader can get a better insight into how large programs may reference
memory. Of particular interest are the results that the influence of
data pages far exceeds that of code pages for large programs, and that
radical changes in program locality tend to be quite infrequent.


The motivation for this work was the desire to understand the impact on
memory management due to the increase in size of applications running
on computers. Examples of these applications are VLSI design aids,
image processing and symbolic computation. These programs are much too
large to study by simulation: they may run for a day and use many
megabytes of memory.

The tracing technique works as follows. Periodically, the operating
system invalidates all page table entries of a user process being
traced. The operating system records the information about each page
fault which occurs in an internal buffer. A second user process copies
data from the buffer to secondary storage. After tracing, the faulted
page would then be validated or paged in as appropriate. Provided that
the period between invalidations is chosen appropriately, the great
majority of instructions do not fault and hence the measured process is
executed at nearly full speed. A slowdown factor in the program's
execution of about two to four is typical. This speed is vastly better
than that achieved by conventional simulation. The record on secondary
storage gives all pages touched by the process during the period
between invalidations. This method is somewhat similar to the typical
cooperation between a debugger and the

To gain insight, a series of studies was performed to detect how real
programs use memory. The principal results presented here
present strip charts.


These studies used instrumented simulators, however, and dealt only
with much smaller programs executing for much shorter periods of time
than those reported here. The previous papers were not content to
analyze the strip charts alone, but proceeded to process the data
further.

Four basic classifications of access patterns have been identified.
Each typifies the access of a segment of memory during a phase of
execution of a program. An access is called total if nearly every page
in its logical address space is touched frequently. An access is
sequential if the pages are used in an ascending or descending
sequence. This kind of program would


pages are never used. This criterion for classification is reasonable
in view of the changing technology of the last few years. Memories are
becoming larger and less costly, while the access time to secondary
storage has been mostly constant. Hence, if some processor or I/O time
can be saved, it is worth the waste of a few pages of memory.

All the programs observed are real applications. In some runs, the data
processed by the program is synthetic, but only when the access
patterns are not data dependent. Some programs are large by most
standards: they use up to 36 megabytes of data and could run a day or
more on a dedicated machine. One


frequently used program: the parser-code generator, optimizer and
loader for a compiler.

No processing of the data was performed.

The final component of the tracing package was the program to be
monitored. The traced program was usually a single process. The program
source was modified, by the insertion of a one-line call, to execute a
routine to set up the measurement tool. At a roughly periodic rate in
real time, the monitored process would be interrupted. Software was
added, by inclusion of some object code at load time, to handle this
interrupt. This software issued the system call that caused all pages
of the process to be invalidated. The first reference to each program
or data page after this would cause a fault. The fault would be traced,
and the page would either be paged in or the page table entry would
simply be validated. Normally, only a single fault for a page would
occur between periodic interrupts. After some analysis, the period
between interrupts was set at 100 milliseconds.

In  summary,  the  pages  touched  by  the 
monitored  process  in  each  100  milliseconds 
period  of  real  time  would  be  recorded. 
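
The recording step summarized above can be imitated at user level. The following is a minimal sketch, not the kernel modification used in the study: it assumes a time-ordered list of (time, byte address) references is already available, and the names `trace_pages`, `PERIOD_MS` and `PAGE_SIZE` are hypothetical. It shows that only the first touch of a page in each period is recorded, which is why most references cost nothing extra.

```python
from typing import Iterable, List, Set, Tuple

PAGE_SIZE = 1024   # bytes per page (assumed, matching the charts)
PERIOD_MS = 100    # period between invalidations


def trace_pages(references: Iterable[Tuple[float, int]],
                period_ms: float = PERIOD_MS,
                page_size: int = PAGE_SIZE) -> List[Set[int]]:
    """Return, for each period, the set of pages that would have faulted.

    `references` must be ordered by time; each item is (time_ms, address).
    """
    intervals: List[Set[int]] = [set()]
    valid: Set[int] = set()        # pages whose table entries are valid
    current = 0                    # index of the period being recorded
    for time_ms, address in references:
        index = int(time_ms // period_ms)
        while current < index:     # periodic interrupt: invalidate all pages
            valid.clear()
            intervals.append(set())
            current += 1
        page = address // page_size
        if page not in valid:      # first touch since invalidation: a fault
            intervals[current].add(page)
            valid.add(page)        # validated, so later touches do not fault
    return intervals
```

Periods in which the process touched nothing appear as empty sets, so the position of an entry in the result still corresponds to elapsed time.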

Several minor operating system specific comments may be helpful for a
reader already familiar with VMUNIX. First, the system call that
invalidated the pages was an extension of the "vadvise" system call.
Second, the minimum period of software generated interrupts was one
second. This was modified to allow interrupts to occur with the
granularity of the line frequency (60 Hz on this machine). Third, a
process was allowed to "time stamp" the trace buffer. This was used to
record virtual time in the buffer so as to be able to convert the trace
data to virtual time. Finally, system calls were added to allow the
tracing to be turned on and off.

In any measurement study, overhead is always a problem. The goal in
this work was to allow substantial overhead, but not to let the
overhead dominate. Most processes took about two to four times the
execution time they used when not traced. This could have been
substantially lessened by invalidating their pages every 100
milliseconds of virtual (instead of real)


included simulators of FIFO, LRU,
algorithm [Bel66], clustered page-1
or the above and sequential detec
sequential detection prepaging policies. All of the above were found to
be unsatisfactory analysis tools because of the number of parameters
necessary to run a simulation. A cloud of numbers was generated. They
provided very little insight into how a program used memory.
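
To make the point about parameters concrete, here is a small sketch of what a FIFO and an LRU simulator compute. It is an illustration only, not the simulators used in the study; the function names and the reference string in the usage note are assumptions. Each run yields a single fault count that depends on the chosen number of page frames.

```python
from collections import OrderedDict, deque
from typing import Sequence


def fifo_faults(refs: Sequence[int], frames: int) -> int:
    """Count page faults under first-in first-out replacement."""
    resident = set()
    queue = deque()                # arrival order of resident pages
    faults = 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())   # evict the oldest arrival
        resident.add(page)
        queue.append(page)
    return faults


def lru_faults(refs: Sequence[int], frames: int) -> int:
    """Count page faults under least-recently-used replacement."""
    resident = OrderedDict()       # least recently used page comes first
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)        # evict least recently used
        resident[page] = True
    return faults
```

For the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, FIFO gives 9 faults with three frames and 10 with four, while LRU gives 10 and 8. Every additional policy and parameter setting adds another such number, which is the "cloud of numbers" problem described above.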

The method of analysis selected is to present the page reference
pattern in the visual form of a strip chart. Each scan line represents
100 milliseconds of virtual time. Time runs down the page. The other
axis is the page number. Pages are 1024 bytes in size. Each dot
represents one or more references for a page during a 100 millisecond
period. References to the stack are not shown. In some cases, only a
fragment of the chart is presented. The charts are not all at the same
magnification.
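
A strip chart of this kind can be approximated in plain text. The sketch below is an assumption about one reasonable rendering, not the plotting program used for the figures; `strip_chart` and its `pages_per_column` magnification parameter are hypothetical names. Each output line is one period, time runs downward, and a dot marks a column containing at least one referenced page.

```python
from typing import Iterable, Set


def strip_chart(intervals: Iterable[Set[int]], first_page: int,
                last_page: int, pages_per_column: int = 1) -> str:
    """Render one text line per period; '.' marks a referenced page column."""
    width = (last_page - first_page) // pages_per_column + 1
    lines = []
    for touched in intervals:
        row = [" "] * width
        for page in touched:
            if first_page <= page <= last_page:
                row[(page - first_page) // pages_per_column] = "."
        lines.append("".join(row).rstrip())
    return "\n".join(lines)
```

For example, `print(strip_chart([{0, 1, 2}, {1}, {2, 3}], 0, 3))` prints three lines, with the dots drifting to the right as higher pages come into use. Raising `pages_per_column` lowers the magnification so that a large address space fits on one line.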

In the analysis of the programs, all conclusions are made from knowing
the general application of the program and from examination of the
chart. They have not been verified by examination of the programs. This
is an advantage of this technique: the analysis does not require
detailed knowledge of the programs.

Figure 1 represents IMAGE (called exyiq by its author). This is a
program that demonstrates certain features of image processing. It
converts a red-green-blue style image to a y-i-q style image. It is
written in C and has 15 pages of program (as in all the programs
examined, the program pages precede data pages in virtual memory).
First of all, notice how little of the address space is used for the
code (less than one percent). Second, the data seems quite well
organized for this type of processing. The working set is small
compared to the total amount of data. What is apparently shown here is
a technique commonly used when processing two (or more) dimensional,
non-sparse matrices [McK69]. The matrix is subdivided into submatrices
by cutting the matrix vertically and horizontally. By adjusting the
size of the submatrices, a row or a column of the original

data pages previously required are no longer needed and all the
unneeded pages are now needed (e.g. all the even pages are no longer
needed but the odd numbered pages are all now needed). The only
reasonable way to run this program is to give it all the memory it
wants. The program is (probably) total after the initialization phase.

Figure 4 is for VAXIMA. This is a VAX extension of MACSYMA running a
demonstration script. This program does symbolic computation (e.g.
integration, equation solving). It is written primarily in LISP, but
has some C primitives. Because of the list structure storage of LISP,
there is no distinction between code and data. It is easy to observe
that there is some sequential behavior in the upper addresses. Also,
many pages are in nearly constant use. However, the outlying points
here should dominate performance. Each of them represents a potential
fault for a memory management policy. This figure illustrates a
potential problem for the visual analysis technique: small outlying
dots tend to be ignored. VAXIMA is regular except during garbage
collection (seconds 21-25) when it is total for certain segments.

Figures 5 and 6 are for SEARCH. This is a FORTRAN program that searches
deep space photographic plates for stars and galaxies. It has 42 pages
of program. Figure 5 shows the first 40 seconds. The pattern
established in the fifth second continues up through the fourteenth
minute shown in Figure 6. The program runs for about six hours loosely
repeating the pattern shown in minutes fifteen through seventeen. It is
doing software paging of a 36 megabyte disk file. Note that, even
though a definite low to high address sweep is visible in Figure 6,
virtually all pages are used every few seconds. The horizontal lines
apparently occur when some feature is detected in the picture and must
be evaluated. This is another program that requires that all its pages
be in memory for efficiency. It could also be argued that this program
is regular since the periods of total use of memory are separated by a
few seconds. Again, this is a technology influenced distinction.

This addressing pattern is an artifact of the

into the address space instead of using software paging. If files were
coupled, then this program would have sequential behavior for one or
two segments.

The three

are the only ones that use significant amounts of time. It should be
emphasized that the structure of this compiler is an artifact of its

benefit even less by optimizations based on this sequentiality. CCOM
should be considered total.


The code optimizer is C2. Its plot is shown in Figure 8. It uses 23
pages of program. Although part of the memory is used sequentially, the
period of use is around a second. This compilation was performed on the
same program used for CCOM. C2 must be viewed as total, but the use of
the segment changes.

The VMUNIX loader is LD. Its plot is shown in Figure 9. Note that there
are two phases. This plot is for the loading of the VMUNIX Operating
System itself, and uses more memory and runs longer than a typical use
of LD. Although some sequentiality exists, the program seems mostly
total.

Notice how densely the parts of a "C Compile" (CCOM, C2 and LD) use the
address space. Since they are total, for these processes swapping would
be more effective than paging.


This paper has presented a technique to study the memory reference
behavior of programs. The technique is fairly simple to implement and
is vastly more efficient than methods normally used to gather memory
reference data. This data is only suited to studies of paging or higher
level memory management, and not suited for studies of cache behavior.

Data references appear to significantly dominate the paging behavior of
large programs. By comparison, the code references tend to be to a
small area of memory and tend to cover this area very densely. Since
data references dominate the paging behavior, this implies that
restructuring efforts should focus on data restructuring and not on
code restructuring [Fer76].

The only predictive memory management policy that was sometimes found
to be useful was that of sequential access. Of the programs measured,
only IMAGE and SEARCH (with files coupled into the address space) would
be amenable to prediction. IMAGE could be classified as a multi-segment
sequential pro-

Both apparently use the division of a matrix into submatrices to reduce
the working set. The other programs basically need all pages in memory
to get good performance. Some reduction in paging might occur if larger
pages or clustering of pages were used.
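
The effect of dividing a matrix into submatrices can be checked with a little address arithmetic. The sketch below is an illustration under assumed parameters (a 64 by 64 matrix of 4-byte elements, 1024-byte pages, 16 by 16 submatrices); the function names are hypothetical and the layouts are not taken from the measured programs. It counts the distinct pages touched when walking one row or one column under each layout.

```python
from typing import Iterable


def row_major_address(r: int, c: int, n: int, elem_size: int) -> int:
    """Byte offset of element (r, c) in an n x n row-major matrix."""
    return (r * n + c) * elem_size


def tiled_address(r: int, c: int, n: int, elem_size: int, tile: int) -> int:
    """Byte offset when the matrix is stored one tile x tile submatrix at a
    time (assumes n is a multiple of tile)."""
    tiles_per_row = n // tile
    tile_index = (r // tile) * tiles_per_row + (c // tile)
    within = (r % tile) * tile + (c % tile)
    return (tile_index * tile * tile + within) * elem_size


def pages_touched(addresses: Iterable[int], page_size: int = 1024) -> int:
    """Number of distinct pages containing the given byte offsets."""
    return len({a // page_size for a in addresses})
```

With these assumed sizes the whole matrix occupies 16 pages. A column walk touches all 16 pages in row-major order but only 4 when stored by submatrix, while a row walk touches 1 page and 4 pages respectively. The submatrix layout trades a slightly larger row working set for a much smaller column working set.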

A major result is the absence of locality changes in the reference
patterns for the program pages (as opposed to the data pages).

m to have no
garbage collec-
for about five seconds, but after this there is no significant change.
The conclusion is that with the larger time granularity appropriate to
modern paging operating systems, changes in code locality appear to be
insignificant. It should be noted that the programs measured here had
not been structured to maximize code locality, but an examination of
the charts suggests that the payoff would not have been large.

Ideally, a clever and simple policy should be devised that would detect
all the characteristics discussed above and manage memory properly.
Failing that, it is desirable to identify primitives that can be
provided by the operating system to allow a program to hint at its
expected behavior. The standard VMUNIX Operating System provided only
one such opportunity at the time of this study: a hint could be given
that the behavior would be anomalous and page replacement should become
random. This was very effectively used by LISP during garbage
collection. An experimental version of VMUNIX now also allows the hint
that a program is about to exhibit sequential behavior.
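
For readers on a present-day system, a comparable hint can be sketched with the `madvise` method of Python's `mmap` module (Python 3.8 and later, on platforms that define `MADV_SEQUENTIAL`). This is an analogy only and is not the VMUNIX interface described above; the function name `sum_bytes_sequentially` is hypothetical. The file is mapped into the address space and the program announces a sequential sweep before making it.

```python
import mmap


def sum_bytes_sequentially(path: str) -> int:
    """Sum the bytes of a non-empty file through a memory mapping, after
    hinting that the mapping will be read in ascending order."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # The hint is advisory; skip it where the platform lacks support.
        if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
            m.madvise(mmap.MADV_SEQUENTIAL)
        total = 0
        for offset in range(0, len(m), mmap.PAGESIZE):
            total += sum(m[offset:offset + mmap.PAGESIZE])
        return total
```

The result is the same with or without the hint. Only the paging behavior may differ, and whether it does depends on the operating system.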

Based on the data presented here, the following seems to be a good
choice for primitives. Programs may be regular (have some locality),
random (no locality), sequential (ascending or descending) or total
(very inefficient to run unless all pages can be memory resident). The
whole program or only a segment of the address space or only a phase in
the program's lifetime may exhibit the hinted behavior. Operating
systems should be equipped to handle hints from programs as to the type
of their memory referencing behavior. It is expected that such hints
would be provided only for highly used programs and for programs with
special performance problems.
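
The four categories can be turned into a rough automatic test on trace data. The sketch below is a heuristic of our own for illustration; the paper's classifications were made by eye from the charts, and the thresholds `total_fraction` and `overlap_fraction` as well as the function name `classify` are assumptions. The input is, for one segment and one phase, the set of pages touched in each period.

```python
from typing import List, Set


def classify(intervals: List[Set[int]], segment_pages: int,
             total_fraction: float = 0.9,
             overlap_fraction: float = 0.5) -> str:
    """Label a phase as 'total', 'sequential', 'regular' or 'random'."""
    busy = [pages for pages in intervals if pages]
    if len(busy) < 2:
        raise ValueError("need at least two non-empty intervals")
    # Total: nearly every page of the segment is touched in every period.
    coverage = sum(len(p) for p in busy) / (len(busy) * segment_pages)
    if coverage >= total_fraction:
        return "total"
    # Sequential: the touched region moves steadily up or steadily down.
    pairs = list(zip(busy, busy[1:]))
    ascending = all(min(a) < min(b) for a, b in pairs)
    descending = all(max(a) > max(b) for a, b in pairs)
    if ascending or descending:
        return "sequential"
    # Regular: successive periods mostly reuse the same pages (locality).
    overlap = sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
    return "regular" if overlap >= overlap_fraction else "random"
```

A program could run such a test over a recorded trace to choose which hint to give for each segment or phase, although small outlying references, which matter for performance, can be hidden by averages like these.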


This work was started as a joint class project with one of the authors
and Douglas Terry. William Joy did the kernel modifications and
provided much valuable advice. Eric Cooper also assisted in the kernel
work. Professor Domenico Ferrari provided guidance throughout this
investigation and reviewed early copies of this paper.

Special thanks go to Steven Shafer (IMAGE) and John Jarvis (SEARCH) who
contributed software to be measured.